Abstract

Using a portfolio of stocks from the London Stock Exchange FTSE100 index (FTSE), 
we study both the time dependence of their correlations and the normalized tree 
length of the associated minimal spanning tree (MST). The first four moments of 
the distribution of correlations and lengths of the tree are examined in detail and 
differences in behaviour noted. For different economic groups and industries,
clustering is evident. However, comparing the classification used prior to 2006 with that
introduced in January 2006, it is clear that the new classification, apart from one or
two notable exceptions, is much more compatible with the clustering obtained by
the MST analysis. We finally compare the MST for real data with that obtained
for a synthetic random market. The latter tree appears more similar to the structure
found by Coronnello et al. for trees based on high frequency data.

Key words: Econophysics, minimal spanning trees, sector analysis, stock 
correlations, random time series. 
1 Introduction



During the past decade, many physicists have used techniques of statistical 
physics and complexity to study economic and financial problems [1,2] and the 
associated networks [3]. Networks play a crucial role in these systems simply 
because trading activity generates networks. Studying stock networks, where 
the links represent similarities between stocks, can prove very valuable for 
portfolio optimization [4,5]. 

A challenging problem is the nature of stock time series and, in particular, the 
nature of their randomness [5,6,7]. Recently the theory of random matrices 
has proved helpful to characterize the time series [8,9]. In this paper we use 
the concept of a minimal spanning tree (MST), proposed by Mantegna [10], to
examine the correlations for stocks from the London FTSE100 index. 

We review briefly the method in the next section and explain how we choose 
the time parameters in Section 3. In Section 4, we use the approach to examine 
a portfolio of stocks selected from the London FTSE100 index. In Section 5, we 
examine in more detail the results for individual stock sectors. For different 
economic groups and industries, clustering derived in Section 4 is evident. 
However, comparing in Section 6 the classification used prior to 2006 with that
introduced in January 2006, it is clear that the new classification, apart from
one or two notable exceptions, is much more compatible with the clustering
obtained by the MST analysis. We finally compare in Section 7 the MST for
real data with that obtained for a synthetic random market and close with a few
conclusions.



2 Definitions 

Our main goal is to detect any underlying structure of a portfolio, such as 
clustering, or identification of key stocks. We start by computing the corre- 
lation coefficient between time series of log-returns of pairs of stocks. From 
these correlations we can compute a distance, for each pair, which is used for 
the construction of a network with links between stocks. 

The 100 most highly capitalized companies in the UK, which comprise the London
FTSE100, represent approximately 80% of the UK market. From these
100 stocks, we study the time series of the daily closing price of N = 67 stocks
that have been in the index continuously over a period of almost 9 years,
from 2nd August 1996 until 27th June 2005. This equals 2322 trading
days per stock. For our analysis of the time dependence of correlations and
distances, the time series are divided into small time windows, each of width T,
that overlap each other. The total number of windows depends on the
window step length parameter, $\delta T$.



2.1 Correlations

The correlation coefficient between stocks i and j is given by:

$$\rho_{ij} = \frac{\langle R_i R_j\rangle - \langle R_i\rangle\langle R_j\rangle}{\sqrt{\left(\langle R_i^2\rangle - \langle R_i\rangle^2\right)\left(\langle R_j^2\rangle - \langle R_j\rangle^2\right)}} \qquad (1)$$

where $R_i$ is the vector of the time series of log-returns,
$R_i(t) = \ln P_i(t) - \ln P_i(t-1)$ is the log-return and $P_i(t)$ the daily closing
price of stock i at day t. The notation $\langle\cdots\rangle$ means an average over
time, $\frac{1}{T}\sum_{t'=t}^{t+T-1}\cdots$, where t is the first day and T is the
length of our time series.

This coefficient varies in the range $-1 \le \rho_{ij} \le 1$, where $-1$ means completely
anti-correlated stocks and $+1$ completely correlated stocks. If $\rho_{ij} = 0$ the
stocks i and j are uncorrelated. The coefficients form a symmetric N x N
matrix with diagonal elements equal to unity.
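As an illustration of eq. (1), the following minimal sketch computes log-returns and the correlation matrix with NumPy. The function names and the synthetic price series are assumptions made for this example only; they are not the FTSE data or the authors' code.

```python
import numpy as np

def log_returns(prices):
    """Log-returns R_i(t) = ln P_i(t) - ln P_i(t-1); prices has shape (days, stocks)."""
    return np.diff(np.log(prices), axis=0)

def correlation_matrix(returns):
    """Correlation coefficients of eq. (1), using time averages over the rows."""
    mean = returns.mean(axis=0)                      # <R_i>
    second = returns.T @ returns / returns.shape[0]  # <R_i R_j>
    cov = second - np.outer(mean, mean)              # <R_i R_j> - <R_i><R_j>
    std = np.sqrt(np.diag(cov))                      # sqrt(<R_i^2> - <R_i>^2)
    return cov / np.outer(std, std)

# Synthetic prices (hypothetical stand-in for real data): geometric random walks.
rng = np.random.default_rng(0)
prices = 100.0 * np.exp(np.cumsum(rng.normal(0.0, 0.01, size=(501, 5)), axis=0))
rho = correlation_matrix(log_returns(prices))
```

The result is symmetric with unit diagonal and should agree with `np.corrcoef(returns, rowvar=False)`, since the normalisation of the averages cancels in the ratio.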

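Following Onnela et al. [5,11], we analyse the distribution of correlations in
time. The first moment is the mean correlation:

$$\bar{\rho} = \frac{2}{N(N-1)}\sum_{i<j}\rho_{ij} \qquad (2)$$

Other moments are similarly defined, the variance:

$$\lambda_2 = \frac{2}{N(N-1)}\sum_{i<j}\left(\rho_{ij}-\bar{\rho}\right)^2 , \qquad (3)$$

the skewness:

$$\lambda_3 = \frac{2}{N(N-1)\,\lambda_2^{3/2}}\sum_{i<j}\left(\rho_{ij}-\bar{\rho}\right)^3 , \qquad (4)$$

and the kurtosis:

$$\lambda_4 = \frac{2}{N(N-1)\,\lambda_2^{2}}\sum_{i<j}\left(\rho_{ij}-\bar{\rho}\right)^4 . \qquad (5)$$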

Evaluation of these moments for time windows of width T reveals the dynamics
of the time series. The higher moments describe how the variance of the correlation
coefficients increases or decreases and how the skewness and kurtosis of the
distribution change. As we will see in Section 4, these moments behave
differently after crashes or days with significant financial news.
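The four moments of eqs. 2 to 5 can be sketched as follows. This is a minimal example that assumes the normalisation of Onnela et al. (averages over the N(N-1)/2 distinct pairs, with skewness and kurtosis divided by the appropriate power of the variance); the function name and the 3 x 3 matrix are made up for illustration.

```python
import numpy as np

def correlation_moments(rho):
    """Mean, variance, skewness and kurtosis of the off-diagonal correlations."""
    n = rho.shape[0]
    values = rho[np.triu_indices(n, k=1)]   # the N(N-1)/2 pairs with i < j
    mean = values.mean()
    centred = values - mean
    variance = np.mean(centred**2)
    skewness = np.mean(centred**3) / variance**1.5
    kurtosis = np.mean(centred**4) / variance**2
    return mean, variance, skewness, kurtosis

# Toy correlation matrix (illustrative values only).
rho = np.array([[1.0, 0.2, 0.4],
                [0.2, 1.0, 0.6],
                [0.4, 0.6, 1.0]])
mean, variance, skewness, kurtosis = correlation_moments(rho)
```

For this symmetric toy example the three pair correlations are 0.2, 0.4 and 0.6, so the mean is 0.4 and the skewness vanishes.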



2.2 Distances 



The metric distance, introduced by Mantegna [10], is determined from the
Euclidean distance between vectors, $d_{ij} = |\tilde{R}_i - \tilde{R}_j|$. Here, the
vectors $\tilde{R}_i$ are computed from $R_i$ by subtracting the mean and dividing by
the standard deviation:

$$\tilde{R}_i = \frac{R_i - \langle R_i\rangle}{\sqrt{\langle R_i^2\rangle - \langle R_i\rangle^2}}$$

Using the definition of the correlation coefficient (eq. 1),
$\rho_{ij} = \tilde{R}_i\cdot\tilde{R}_j$, and noting that $|\tilde{R}_i| = 1$ (with the
scalar product taken as the time average $\langle\cdots\rangle$), it follows that:

$$d_{ij}^2 = |\tilde{R}_i - \tilde{R}_j|^2 = |\tilde{R}_i|^2 + |\tilde{R}_j|^2 - 2\,\tilde{R}_i\cdot\tilde{R}_j = 2 - 2\rho_{ij}$$

This relates the distance of two stocks to their correlation coefficient:

$$d_{ij} = \sqrt{2\,(1-\rho_{ij})} \qquad (6)$$

This distance varies in the range $0 \le d_{ij} \le 2$, where small values imply strong
correlations between stocks.
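The relation between distance and correlation can be checked numerically. The sketch below uses synthetic Gaussian returns and a hypothetical helper `distance_matrix`; the extra factor of 1/sqrt(T) in the normalisation is an assumption made here so that the ordinary dot product reproduces the time average and each vector has unit length.

```python
import numpy as np

def distance_matrix(rho):
    """d_ij = sqrt(2 (1 - rho_ij)); clipping guards against rounding just outside [-1, 1]."""
    return np.sqrt(2.0 * (1.0 - np.clip(rho, -1.0, 1.0)))

rng = np.random.default_rng(1)
returns = rng.normal(size=(250, 3))             # synthetic returns, not market data

# Subtract the mean, divide by the standard deviation, then scale to unit length.
z = (returns - returns.mean(axis=0)) / returns.std(axis=0)
z /= np.sqrt(len(returns))                      # assumption: makes |z_i| = 1

rho = z.T @ z                                   # rho_ij as a dot product of unit vectors
d = distance_matrix(rho)
direct = np.linalg.norm(z[:, 0] - z[:, 1])      # Euclidean distance computed directly
```

Here `d[0, 1]` and `direct` should agree up to floating-point rounding.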

Following the procedure of Mantegna [10], this distance matrix is now used
to construct a network with the essential information of the market. This
network is a minimal spanning tree (MST) with N - 1 links connecting N
nodes. The nodes represent stocks and the links are chosen such that the
sum of all distances (and hence the normalized tree length) is minimal. We
perform this computation using Prim's algorithm [12].

The normalized tree length, again following Onnela et al. [5,11], is given by

$$L = \frac{1}{N-1}\sum_{d_{ij}\in\Theta} d_{ij} \qquad (7)$$

where $\Theta$ represents the MST. We also compute its higher moments (variance,
skewness and kurtosis) and compare them with the equivalent moments of the
correlations.
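A minimal sketch of the tree construction is given below: a dense-matrix version of Prim's algorithm followed by the normalized tree length, i.e. the sum of the N - 1 tree distances divided by N - 1. The function names and the 4 x 4 distance matrix are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def prim_mst(dist):
    """Return the N-1 edges (i, j) of a minimal spanning tree of a dense distance matrix."""
    n = dist.shape[0]
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True                       # start from an arbitrary node
    best = dist[0].copy()                   # cheapest known link from the tree to each node
    parent = np.zeros(n, dtype=int)         # tree node that provides that link
    edges = []
    for _ in range(n - 1):
        candidates = np.where(in_tree, np.inf, best)
        j = int(np.argmin(candidates))      # closest node not yet in the tree
        edges.append((int(parent[j]), j))
        in_tree[j] = True
        closer = (dist[j] < best) & ~in_tree
        best[closer] = dist[j][closer]      # update links that became shorter via j
        parent[closer] = j
    return edges

def normalized_tree_length(dist, edges):
    """Sum of the tree distances divided by the number of links, N - 1."""
    return sum(dist[i, j] for i, j in edges) / (dist.shape[0] - 1)

# Toy distance matrix (made-up values in the allowed range 0..2).
dist = np.array([[0.0, 1.0, 1.4, 1.3],
                 [1.0, 0.0, 0.8, 1.2],
                 [1.4, 0.8, 0.0, 0.9],
                 [1.3, 1.2, 0.9, 0.0]])
edges = prim_mst(dist)
length = normalized_tree_length(dist, edges)
```

For this toy matrix the tree is the chain 0-1-2-3 with distances 1.0, 0.8 and 0.9, giving a normalized length of 0.9.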



3 Determination of time parameters 

Depending on the length of the time series, the correlation coefficient between
two stocks changes. Thus the distance between the two stocks will be different
and the MST constructed will have different characteristics. In order to select
appropriate values for the size of the time windows (T) and the window step length
parameter ($\delta T$) we looked at earlier studies in this field. As shown previously
[5], the first and second moments of the correlations (mean correlation and
variance) are strongly correlated. Taking this into account, we computed the
value of this correlation as a function of T and $\delta T$ (Figure 1).
Figure 1. Correlation between the first two moments of the correlation coefficient
(mean, eq. 2 and variance, eq. 3) as a function of T and $\delta T$. The left graphic shows the
correlation for different T as a function of $\delta T$. The right graphic shows the correlation
for $\delta T = 1$, as a function of T.



Clearly, for all T, the correlation between the two moments is not only positive
but strong, above 0.9 for T = 750, T = 1000 and T = 1250. Apart from
T = 250 and T = 1750, the correlation value fluctuates only slightly
when we vary $\delta T$. Since increasing $\delta T$ essentially
removes points from our data, we decided to use the smallest value of $\delta T$ (1
day) in all of the following.

Some events such as wars or crashes occurred during the period of study and
are noted in Figure 2, which shows the absolute return of the FTSE Index.
After these occurrences, which have a negative effect on stock values, all the
stocks seem to follow each other, and both the correlation between them
and the mean correlation increase [13]. Now even if the correlation between mean
correlation and variance increases when we increase T, the curve of mean
correlation is based on less information. So the choice of T becomes something
of a compromise. We choose T = 500 (2 years) and $\delta T = 1$ (1 day) to compute
our moments for the correlations and distances.
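The window bookkeeping and the mean-variance comparison described in this section can be sketched as follows. The helper names and the synthetic returns are assumptions for illustration. The window count also assumes that windows are taken over log-returns, in which case 2322 prices give 2321 returns and T = 500 with a step of 1 day yields 1822 windows, matching the number of points quoted in the figure captions.

```python
import numpy as np

def window_starts(n_returns, width, step):
    """Start indices of all overlapping windows of `width` returns, moved by `step`."""
    return range(0, n_returns - width + 1, step)

def mean_variance_link(returns, width, step):
    """Correlation between the time series of the mean and the variance of rho_ij."""
    pairs = np.triu_indices(returns.shape[1], k=1)
    means, variances = [], []
    for start in window_starts(len(returns), width, step):
        rho = np.corrcoef(returns[start:start + width], rowvar=False)
        values = rho[pairs]
        means.append(values.mean())
        variances.append(values.var())
    return np.corrcoef(means, variances)[0, 1]

# 2322 daily prices give 2321 log-returns; T = 500 and a step of 1 give 1822 windows.
n_windows = len(window_starts(2321, 500, 1))

rng = np.random.default_rng(2)
toy_returns = rng.normal(size=(300, 6))     # synthetic stand-in for real returns
link = mean_variance_link(toy_returns, width=100, step=5)
```

Repeating the last call over a grid of `width` and `step` values would give curves of the kind summarised in Figure 1, although uncorrelated synthetic data will not reproduce the strong positive link seen for real stocks.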
Figure 2. Absolute return of the FTSE Index. Higher values indicate special days
such as the beginning of wars or crashes. 1) Russian crash; 2) NASDAQ crash; 3) Beginning
of US recession; 4) 11th September 2001; 5) Stock Market downturn of 2002; 6)
Beginning of Iraq War.



4 Analysis of Global Portfolio of FTSE100 index 



The time dependence of the mean correlation, the normalized tree length
and the higher moments associated with these two quantities were studied
for a time window T = 500 and window step length $\delta T = 1$. Figure 3
shows that the mean and variance of the correlation coefficients are highly
correlated (0.779), the skewness and kurtosis are also highly correlated and
the mean and skewness are anti-correlated. This implies that when the mean
correlation increases, usually after some negative event in the market, the variance
increases. Thus the dispersion of values of the correlation coefficient is higher.
The skewness is almost always positive, which means that the distribution
is asymmetric, but after a negative event the skewness decreases, and the
distribution of the correlation coefficients becomes more symmetric.
Figure 3. Mean (eq. 2), variance (eq. 3), skewness (eq. 4) and kurtosis (eq. 5) of the
correlation coefficients. We use time windows of length T = 500 days and window
step length parameter $\delta T = 1$ day. For each moment, $(T_{\mathrm{total}} - T)/\delta T = 1822$ data
points are shown.

From Figure 4, we see how the normalized tree length changes with time. As
expected from equation 6, when the mean correlation increases, the normalized
tree length decreases and vice versa. Here, the mean and the variance of the
normalized tree length are anti-correlated, while the skewness and the
mean continue to be anti-correlated. This means that after a negative
event impacts the market, the tree shrinks, so the mean distance decreases [13];
the variance increases, implying a higher dispersion of the distance values;
and the skewness, which is almost always negative, increases, showing that the
distribution of the distances of the MST becomes more symmetric.
Figure 4. Mean (eq. 7), variance, skewness and kurtosis of the normalized tree length.
We use time windows of length T = 500 days and window step length parameter
$\delta T = 1$ day. For each moment there are 1822 points represented.

Figure 5 is an enlarged version of the top graphic of Figure 4. This shows
the mean of the normalized tree length. As can be seen, the normalized tree
length decreases after some of the events shown in Figure 2: the Russian Crash
(October 1998), the Dot-Com Bubble (March 2000), the beginning of the US
recession (March 2001), the attack on the Twin Towers (11th September 2001),
the Stock Market Downturn of 2002 with accounting scandals (a long period
between March 2002 and October 2002) and the beginning of the Iraq War
(March 2003).
Figure 5. Normalized tree length as a function of time. Different external events affect
the market. 1) Russian crash; 2) NASDAQ crash; 3) US recession; 4) 11th September
2001; 5) Stock Market Downturn of 2002; 6) Iraq War.



5 Sector Analysis 



A study of stocks such as the one considered here gives us an insight into the
behaviour of the market over time. A specific study of each sector of the market
is also of interest. We have studied two different classifications. First we
consider the old classification for the London FTSE100, the FTSE Global
Classification System [14], that was in use from 2003 until the end of 2005. This
classification groups the stocks into 102 Subsectors, 36 Sectors and 10 Economic
Groups. Our portfolio is composed of 9 economic groups and 27 sectors:
Resources (Mining, Oil & Gas), Basic Industries (Chemicals, Construction &
Building Materials), General Industrials (Aerospace & Defense), Non-cyclical
Consumer Goods (Beverages, Food Producers & Processors, Health, Personal
Care & Household Products, Pharmaceuticals & Biotechnology, Tobacco),
Cyclical Services (General Retailers, Leisure & Hotels, Media & Entertainment,
Support Services, Transport), Non-cyclical Services (Food & Drug Retailers,
Telecommunication Services), Utilities (Electricity, Utilities-Others),
Financials (Banks, Insurance, Life Assurance, Investment Companies, Real
Estate, Speciality & Other Finance) and Information Technology (Software &
Computer Services).

The second classification studied is the new classification adopted by FTSE 
since the beginning of 2006, the Industry Classification Benchmark [15] cre- 
ated by Dow Jones Indexes and FTSE. This classification is divided into 10 
Industries, 18 Supersectors, 39 Sectors and 104 Subsectors. Our portfolio is 
composed of 10 industries and 28 sectors: Oil & Gas (Oil & Gas Producers), 
Basic Materials (Chemicals, Mining), Industrials (Construction & Materials, 
Aerospace & Defense, General Industrials, Industrial Transportation, Support 
Services), Consumer Goods (Beverages, Food Producers, Household Goods, 
Tobacco), Health Care (Health Care Equipment & Services, Pharmaceuticals 
& Biotechnology), Consumer Services (Food & Drug Retailers, General Retailers,
Media, Travel & Leisure), Telecommunications (Fixed Line Telecommunications,
Mobile Telecommunications), Utilities (Electricity, Gas, Water
& Multiutilities), Financials (Banks, Nonlife Insurance, Life Insurance, Real
Estate, General Financial, Equity Investment Instruments, Nonequity Invest- 
ment Instruments) and Technology (Software & Computer Services). 

For the old classification, the four economic groups with the most stocks are
Non-cyclical Consumer Goods (13), Cyclical Services (21), Non-cyclical
Services (6) and Financials (13). For each of these groups we have repeated
the above analysis of moments.
Figure 6. Mean and variance of the correlation coefficients for different economic
groups, from the FTSE Global Classification System, in comparison with the global
portfolio.

As can be seen, not all the economic groups behave like the global portfolio.
Looking at the mean correlation, the Financial group is much more correlated
than all the other groups. If we analyse the variance, the Financial and
Non-cyclical Services groups lose the global property whereby the first two moments
of the correlation coefficients are correlated.

For the new classification, the four industries with the most stocks are
Industrials (10), Consumer Goods (9), Consumer Services (18) and Financials (13).
The mean and variance of the correlation coefficients for these industries are
presented in Figure 7.
Figure 7. Mean and variance of the correlation coefficients for different industries,
from the ICB, in comparison with the global portfolio.

With this classification, all the industries lose the global property whereby the
first two moments of the correlation coefficients are correlated.



6 Minimal Spanning Trees 

For a topological view of the market we plot the MST with all the nodes 
(stocks) and links between them (distances) . For each classification we analyse 
the cluster formation of different economic groups (FTSE Global Classification 
System) or industries (ICB). 

Starting with the analysis based on the old classification, we represent each
economic group by a different symbol: Resources (■), Basic Industries (A),
General Industrials (♦), Non-cyclical Consumer Goods (□), Cyclical Services (A),
Non-cyclical Services (0), Utilities (•), Financials (gray o) and Information
Technology (o).



Figure 8 shows the MST with clusters of specific economic groups. Stocks
from the Financial group are the backbone of this tree. It seems that all
the other groups are connected to this one. The Financials, Resources, Utilities
and General Industrials groups have all their stocks connected together.
For other groups, however, divisions of stocks into sectors are apparent. For
example, in the Non-cyclical Services, the Food & Drug Retailers are completely
separated from the Telecommunication Services. Within Cyclical Services, the
General Retailers, Media & Entertainment and Transport form 3 different clusters
and the Support Services are isolated stocks connected to the Financial
branch. In Non-cyclical Consumer Goods, the Health and Pharmaceuticals &
Biotechnology sectors form one cluster whereas Beverages, Tobacco, Food Producers
& Processors and Personal Care & Household Products form another.
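Statements such as "all their stocks connected together" can be checked mechanically once the tree edges are known. The sketch below is one possible way to do this, not the procedure used in the paper: it tests whether the members of a group form a single connected piece of the tree and counts links per node to spot hubs. The tickers and the toy tree are hypothetical.

```python
from collections import Counter, defaultdict

def group_is_connected(edges, members):
    """True if the tree restricted to `members` (non-empty) is one connected piece."""
    members = set(members)
    neighbours = defaultdict(set)
    for a, b in edges:
        if a in members and b in members:    # keep only links inside the group
            neighbours[a].add(b)
            neighbours[b].add(a)
    start = next(iter(members))
    seen, stack = {start}, [start]
    while stack:                              # depth-first search within the group
        node = stack.pop()
        for nxt in neighbours[node] - seen:
            seen.add(nxt)
            stack.append(nxt)
    return seen == members

# Hypothetical toy tree with made-up tickers (7 nodes, 6 links).
edges = [("BANK1", "BANK2"), ("BANK2", "INS1"), ("BANK2", "RET1"),
         ("INS1", "TEL1"), ("RET1", "RET2"), ("RET2", "FOOD1")]
financials = {"BANK1", "BANK2", "INS1"}
services = {"TEL1", "FOOD1"}

financials_ok = group_is_connected(edges, financials)
services_ok = group_is_connected(edges, services)
degree = Counter(node for edge in edges for node in edge)   # number of links per stock
```

In this toy tree the financial stocks form one cluster, while the two service stocks hang from different branches, which mirrors the kind of split described above for Non-cyclical Services.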



Figure 8. Minimal Spanning Tree for the FTSE100 stocks. The length of the time
series used to compute this tree is 2322 days. Each symbol corresponds to a specific
economic group from the FTSE Global Classification System.

For the new classification, we represent each industry by a symbol: Oil & Gas
(■), Basic Materials (A), Industrials (♦), Consumer Goods (gray □), Health
Care (□), Consumer Services (A), Telecommunications (0), Utilities (•),
Financials (gray o) and Technology (o). The MST is represented in Figure 9.
The Financial industry has the same stocks as the one in the old classification,
so it still works as the backbone of the tree. Financials, Oil & Gas, Utilities,
Telecommunications and Consumer Goods have all their stocks connected together.
In Consumer Services, the supersectors Retail and Media are two
big clusters but they are not connected together. The other supersector from
this industry, Travel & Leisure, is dispersed across the tree. The Health Care
industry is almost one cluster, but the stock SHP is not connected to the others.
In the Industrials industry the stocks from the Support Services sector are
always connected to the Financial industry. The other stocks in this industry are
located in isolation at other points within the tree.




The new classification adopted by FTSE in January 2006 clearly mimics the MST
results much more closely, as we can see from Figures 8 and 9. The implementation
of the new supersector groups ensures that, apart from some notable
exceptions, stocks from the same supersector are now connected. It is possible
that the few stocks separated from their main cluster are isolated by chance
and over time they will join the appropriate clusters. However, there could be
other more fundamental reasons for their separation. Further study of
the dynamics of the MST correlations, together with the economic indices
(e.g. PE ratio, earnings, etc.) that characterize the businesses concerned, is
necessary to resolve these issues. Nevertheless, it seems clear from this analysis
that the MST approach is one that should complement current approaches to
the development of stock taxonomy.

Coronnello et al. [16] have studied the topology of the London FTSE using
daily and intra-day data for N = 92 stocks from the year 2002. The MST for
daily data looks quite different from the one shown in Figure 8. Using our
data and studying the MST for each year, we can see that for 2002 the main
hubs of the MST are BARC, RBS and SHEL, with 11, 8 and 7
links, respectively. The simple inclusion of BARC in our study (not included
in the portfolio of [16]) gives a quite different network. But the main clusters
are the same in the two studies.



7 Numerical Simulations of MST 

In order to examine further the underlying nature of the time series, we now use
random time series computed from two different models. Modeling the
log-returns as random numbers from a specific distribution, we can compute
the correlations, distances and trees for these random series. As in [6,7], our
first approach was to consider the returns as random variables derived from a
Gaussian distribution. So, using the real mean value, $\mu_i$, of each real time series
and the specific real variance, $\sigma_i^2$, we compute random series for our random
market:

$$r_i(t) = \mu_i + \epsilon_i(t)$$

where $\epsilon_i(t)$ is the stochastic variable from a Gaussian distribution with
variance $\sigma_i^2$. The MST for this random time series is represented in Figure 10.
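A minimal sketch of such a Gaussian random market is given below. It assumes that each surrogate series is drawn independently with the sample mean and standard deviation of the corresponding input series; the function name and the synthetic input are illustrative only.

```python
import numpy as np

def gaussian_market(returns, rng):
    """Independent Gaussian series with each stock's own sample mean and variance."""
    mu = returns.mean(axis=0)        # mean of each series
    sigma = returns.std(axis=0)      # standard deviation of each series
    return mu + sigma * rng.standard_normal(returns.shape)

rng = np.random.default_rng(3)
real_like = rng.normal(0.0005, 0.02, size=(2000, 4))   # synthetic stand-in for real returns
surrogate = gaussian_market(real_like, rng)
rho = np.corrcoef(surrogate, rowvar=False)
off_diagonal = rho[np.triu_indices(4, k=1)]
```

Because the surrogate series are independent, the off-diagonal correlations are close to zero, so all distances are close to sqrt(2) and no particular tree structure is favoured.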




Figure 10. Minimal Spanning Tree for 67 random time series using random variables 
from a Gaussian distribution. 

This MST shows no clustering: the stocks are distributed randomly in the
network and no stock has more than 4 links. To create random time
series with more realistic characteristics we introduce a control term (the return
of the FTSE Index) and compute a one-factor model [6,7]:

$$r_i(t) = \alpha_i + \beta_i R_M(t) + \epsilon_i(t)$$

where $\alpha_i$ and $\beta_i$ are parameters estimated by the least squares method from
our data, $R_M(t)$ is the market factor (return of the FTSE Index) and $\epsilon_i(t)$ is the
stochastic variable from a Gaussian distribution with variance $\sigma_i^2$. The two
factors are calculated as:

$$\alpha_i = \langle R_i(t)\rangle - \beta_i\,\langle R_M(t)\rangle$$

$$\beta_i = \frac{\mathrm{cov}\left(R_i(t), R_M(t)\right)}{\sigma_M^2}$$

where cov(...,...) is the covariance, $\sigma_M^2$ is the variance of the returns of the
FTSE Index and $R_i(t)$ is the return of real stock i.

The MST for random time series created using this model is shown in Figure 11.

Figure 11. Minimal Spanning Tree for 67 random time series using the one-factor
model.

This network is completely different from the previous random network. Now
we see that the stocks from the Financial group (gray o) are all linked together.
As in the MST for real data (Figures 8 and 9) they act as the backbone of the
network. However, the presence of 6 nodes with up to 13 links differs from
the topology of real data. A model to describe the time series of log-returns
seems to lie somewhere between a completely random model and a one-factor
model. The completely random model does not give much information.
However, apart from producing the Financial group backbone, the one-factor
model shows a topology similar to the MST shown by Coronnello et al. [16]
for intra-day data. This suggests that the networks formed by intra-day data
are not fully formed: information has not yet spread sufficiently and correlations
are not yet as developed as they are in the networks formed using daily
data.
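The one-factor construction described in this section can be sketched as follows. The least-squares estimates follow the covariance and mean expressions given above; scaling the Gaussian noise to the standard deviation of the regression residuals is an assumption of this sketch, since the text does not say explicitly which variance is used. The function name and all data are synthetic.

```python
import numpy as np

def one_factor_market(returns, market, rng):
    """Surrogate returns r_i(t) = alpha_i + beta_i R_M(t) + eps_i(t)."""
    centred_market = market - market.mean()
    centred_returns = returns - returns.mean(axis=0)
    # beta_i = cov(R_i, R_M) / var(R_M); the 1/T factors cancel in the ratio.
    beta = centred_market @ centred_returns / (centred_market @ centred_market)
    alpha = returns.mean(axis=0) - beta * market.mean()
    fitted = alpha + np.outer(market, beta)
    sigma = (returns - fitted).std(axis=0)   # ASSUMPTION: noise scaled to the residuals
    noise = sigma * rng.standard_normal(returns.shape)
    return fitted + noise, alpha, beta

rng = np.random.default_rng(4)
market = rng.normal(0.0, 0.01, size=1500)              # synthetic index returns
true_beta = np.array([0.5, 1.0, 1.5])                  # made-up sensitivities
returns = 0.0002 + np.outer(market, true_beta) + rng.normal(0.0, 0.005, size=(1500, 3))

surrogate, alpha, beta = one_factor_market(returns, market, rng)
rho = np.corrcoef(surrogate, rowvar=False)
off_diagonal = rho[np.triu_indices(3, k=1)]
```

Since every surrogate series shares the same market factor, all pairs are positively correlated, which is consistent with the appearance of highly connected hubs in the resulting tree.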



8 Conclusions 

In summary, we have studied the correlations between time series of log-returns
of stocks from a FTSE100 portfolio and examined how these change with both
the size of the time series and time. The mean correlation increases after
external crises, and different moments feature correlations or anti-correlations
as a result. From the study of the specific stocks of each sector we conclude that
some sectors respond differently to external events.

From the MST we can see that some stocks from the same sector cluster together.
This does not happen with all stocks from specific economic groups or
industries. It would seem from the MST analysis that the new FTSE classification
introduced in January 2006 offers a more logical clustering of the different
stocks than the previous classification scheme. However, from the
MST it is clear that anomalies are still present that could affect the building
of optimum portfolios.

The structure of trees generated from random time series differs significantly
from real markets. Furthermore there appears to be no obvious hub node.
On the other hand the one-factor model produces an MST where we can see
hubs with many links. This kind of structure is close to that obtained using
intra-day data. In future papers we shall assess changes in the tree structures
using a one-factor model with Levy distributions. We shall also look at this issue
by deriving analytic expressions linking the moments of correlations to the
moments of the lengths.



Acknowledgements 

This publication has emanated from research conducted with the financial 
support of Science Foundation Ireland (04/BRG/PO251). The authors also 
acknowledge the help of COST (European Cooperation in the Field of Scientific
and Technical Research) Action P10.






References 



[1] R. N. Mantegna and H. E. Stanley, An Introduction to Econophysics:
Correlations and Complexity in Finance. Cambridge University Press, 
Cambridge (2001) 

[2] J.-P. Bouchaud and M. Potters, Theory of Financial Risk and Derivative 
Pricing. Cambridge University Press, Cambridge (2003) 

[3] S. N. Dorogovtsev and J. F. F. Mendes, Evolution of Networks: From Biological 
Nets to the Internet and WWW. Oxford University Press, Oxford (2003) 

[4] V. Tola, F. Lillo, M. Gallegati and R. N. Mantegna, preprint physics/0507006

[5] J.-P. Onnela, A. Chakraborti, K. Kaski, J. Kertesz and A. Kanto, Phys. Rev. E 
68, 056110 (2003) 

[6] G. Bonanno, G. Caldarelli, F. Lillo and R. N. Mantegna, Phys. Rev. E 68, 
046130 (2003) 

[7] G. Bonanno, G. Caldarelli, F. Lillo, S. Micciche, N. Vandewalle and R. N. 
Mantegna, Eur. Phys. J. B 38, 363 (2004) 

[8] L. Laloux, P. Cizeau, J.-P. Bouchaud and M. Potters, Phys. Rev. Lett. 83, 1467 
(1999) 

[9] V. Plerou, P. Gopikrishnan, B. Rosenow, L. A. N. Amaral and H. E. Stanley, 
Phys. Rev. Lett. 83, 1471 (1999) 

[10] R. N. Mantegna, Eur. Phys. J. B 11, 193 (1999) 

[11] J.-P. Onnela, A. Chakraborti, K. Kaski and J. Kertesz, Eur. Phys. J. B 30, 285
(2002) 

[12] R. C. Prim, Bell System Technical Journal 36, 1389 (1957) 

[13] J.-P. Onnela, A. Chakraborti, K. Kaski and J. Kertesz, Physica A 324, 247 
(2003) 



[14] http://www.ftse.com 



[15] http://www.icbenchmark.com/



[16] C. Coronnello, M. Tumminello, F. Lillo, S. Micciche and R. N. Mantegna, Acta 
Physica Polonica 36 (9), 2653 (2005) 






